Multi-stream Confidence Analysis for Audio-Visual Affect Recognition
Authors
Abstract
Changes in a speaker's emotion are a fundamental component of human communication. Some emotions motivate human actions, while others add deeper meaning and richness to human interactions. In this paper, we explore the development of a computing algorithm that uses audio and visual sensors to recognize a speaker's affective state. Within the framework of the multi-stream hidden Markov model (MHMM), we analyze audio and visual observations to detect 11 cognitive/emotive states. We investigate the use of individual modality confidence measures as a means of estimating weights when combining likelihoods in audio-visual decision fusion. Person-independent experimental results from 20 subjects in 660 sequences suggest that using stream exponents estimated on training data improves the classification accuracy of audio-visual affect recognition.
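The decision fusion described above amounts to raising each stream's class-conditional likelihood to a stream exponent, which in the log domain is simply a weighted sum of per-class log-likelihoods. Below is a minimal sketch of that combination step, assuming the per-class scores from the audio and visual HMMs are already available and that the exponents sum to one; the function name and the toy numbers are illustrative, not taken from the paper.

```python
import numpy as np

def fuse_streams(audio_loglik, visual_loglik, lambda_audio):
    """Combine per-class log-likelihoods from the audio and visual streams.

    Raising a likelihood to an exponent is a multiplication in the log
    domain: log(p_a**w_a * p_v**w_v) = w_a*log(p_a) + w_v*log(p_v).
    """
    lambda_visual = 1.0 - lambda_audio  # exponents assumed to sum to one
    return lambda_audio * audio_loglik + lambda_visual * visual_loglik

# Toy example with 11 affective states and made-up per-class scores.
rng = np.random.default_rng(0)
audio_loglik = rng.normal(-100.0, 5.0, size=11)   # scores from the audio HMM
visual_loglik = rng.normal(-80.0, 5.0, size=11)   # scores from the visual HMM

fused = fuse_streams(audio_loglik, visual_loglik, lambda_audio=0.6)
print("recognized state index:", int(np.argmax(fused)))
```

In the setting of the abstract, the stream exponent would be estimated on training data rather than chosen by hand as it is here.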
Similar resources
Stream confidence estimation for audio-visual speech recognition
We investigate the use of single-modality confidence measures as a means of estimating adaptive, local weights for improved audio-visual automatic speech recognition. We limit our work to the toy problem of audio-visual phonetic classification by means of a two-stream Gaussian mixture model (GMM), where each stream models the class-conditional audio- or visual-only observation probability, raised...
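The snippet above does not spell out the confidence measure, so the following sketch should be read only as one plausible illustration: it uses the normalized entropy of the class posteriors implied by each stream's scores as a confidence value and turns the two confidences into a local audio weight. The function names and numbers are assumptions, not details from the cited work.

```python
import numpy as np

def posterior_entropy(logliks):
    """Normalized entropy of the class posteriors implied by a set of
    per-class log-likelihoods (uniform prior assumed): near 0 when one
    class dominates, near 1 when the stream is ambiguous."""
    logpost = logliks - np.logaddexp.reduce(logliks)  # log-softmax
    post = np.exp(logpost)
    return -np.sum(post * logpost) / np.log(len(logliks))

def local_audio_weight(audio_logliks, visual_logliks):
    """Give more weight to the stream whose posteriors are less ambiguous."""
    conf_a = 1.0 - posterior_entropy(audio_logliks)
    conf_v = 1.0 - posterior_entropy(visual_logliks)
    return conf_a / (conf_a + conf_v + 1e-12)

# Example: a nearly flat (noisy) audio frame versus a sharp visual frame.
audio = np.array([-50.1, -50.0, -50.2, -50.05])
visual = np.array([-40.0, -55.0, -60.0, -58.0])
print("local audio weight:", round(local_audio_weight(audio, visual), 3))
```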
Product HMMs for audio-visual continuous speech recognition using facial animation parameters
The use of visual information in addition to acoustic information can improve automatic speech recognition. In this paper, we compare different approaches for audio-visual information integration and show how they affect automatic speech recognition performance. We utilize Facial Animation Parameters (FAPs), supported by the MPEG-4 standard for the visual representation, as visual features. We use both Singl...
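To make the contrast between integration approaches concrete, here is a toy comparison of single-stream scoring (the features treated as one concatenated observation) and multi-stream scoring (per-stream likelihoods combined with exponents). The Gaussian models, dimensions, and weights are invented for illustration, and the joint model is simplified to a product of marginals for brevity.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy per-class observation models: 3 dimensions stand in for audio features,
# 2 dimensions for FAP-like visual features (all statistics are made up).
audio_models = {c: multivariate_normal(mean=np.full(3, float(c)), cov=np.eye(3))
                for c in range(2)}
visual_models = {c: multivariate_normal(mean=np.full(2, float(-c)), cov=np.eye(2))
                 for c in range(2)}

def single_stream_score(x_audio, x_visual, c):
    """Early integration: score the concatenated observation with one model.
    (With a full joint covariance the two schemes would differ more; the
    streams are treated as independent here only to keep the sketch short.)"""
    return audio_models[c].logpdf(x_audio) + visual_models[c].logpdf(x_visual)

def multi_stream_score(x_audio, x_visual, c, w_audio=0.7):
    """Separate-stream integration: per-stream likelihoods raised to exponents."""
    return (w_audio * audio_models[c].logpdf(x_audio)
            + (1.0 - w_audio) * visual_models[c].logpdf(x_visual))

x_a, x_v = np.array([0.2, -0.1, 0.3]), np.array([0.1, 0.0])
print("single-stream:", [round(single_stream_score(x_a, x_v, c), 2) for c in range(2)])
print("multi-stream: ", [round(multi_stream_score(x_a, x_v, c), 2) for c in range(2)])
```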
Stream weight estimation using higher order statistics in multi-modal speech recognition
In this paper, stream weight optimization for multi-modal speech recognition using audio and visual information is examined. In the conventional multi-stream Hidden Markov Model (HMM) used in multi-modal speech recognition, a constraint that the audio and visual weight factors sum to one is employed. This means the balance between transition and observation probabiliti...
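The consequence of the sum-to-one constraint can be seen in a toy Viterbi-style score: keeping the audio/visual weight ratio fixed but scaling both exponents (i.e. dropping the constraint) changes how much the observation likelihoods count relative to the transition probabilities, and can change the winning path. The numbers below are invented purely to show that effect.

```python
import numpy as np

def path_score(log_trans, audio_ll, visual_ll, w_audio, w_visual):
    """One Viterbi-style term: transition log-probability plus
    exponent-weighted per-stream observation log-likelihoods."""
    return log_trans + w_audio * audio_ll + w_visual * visual_ll

# Two competing transitions into the same state (made-up numbers):
# path A has the better transition, path B the better observations.
a = dict(log_trans=np.log(0.6), audio_ll=-12.0, visual_ll=-9.0)
b = dict(log_trans=np.log(0.1), audio_ll=-11.0, visual_ll=-8.0)

for w_a, w_v in [(0.6, 0.4), (1.2, 0.8)]:  # same ratio, different total mass
    score_a = path_score(**a, w_audio=w_a, w_visual=w_v)
    score_b = path_score(**b, w_audio=w_a, w_visual=w_v)
    print(f"weights ({w_a}, {w_v}): best path =", "A" if score_a > score_b else "B")
```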
Using the Multi-Stream Approach for Continuous Audio-Visual Speech Recognition: Experiments on the M2VTS Database
The Multi-Stream automatic speech recognition approach was investigated in this work as a framework for audio-visual data fusion and speech recognition. This method presents many potential advantages for such a task. It particularly allows for synchronous decoding of continuous speech while still allowing for some asynchrony of the visual and acoustic information streams. First, the Multi-Stream f...
Asynchrony modeling for audio-visual speech recognition
We investigate the use of multi-stream HMMs in the automatic recognition of audio-visual speech. Multi-stream HMMs allow the modeling of asynchrony between the audio and visual state sequences at a variety of levels (phone, syllable, word, etc.) and are equivalent to product, or composite, HMMs. In this paper, we consider such models synchronized at the phone boundary level, allowing various de...
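As an illustration of the composite state space such a product HMM implies, the sketch below assumes three-state left-to-right audio and visual models for a single phone, synchronized at the phone boundary: within the phone the two streams may occupy different states, but both must reach their final state before the phone can be exited. The topology and state counts are assumptions for illustration, not the cited paper's exact configuration.

```python
from itertools import product

# Composite states are pairs (audio_state, visual_state) of two 3-state
# left-to-right phone models; asynchrony is allowed inside the phone only.
N_STATES = 3
composite_states = list(product(range(N_STATES), range(N_STATES)))

def successors(state):
    """Left-to-right moves: each stream either stays put or advances by one."""
    a, v = state
    return sorted({(a + da, v + dv)
                   for da in (0, 1) for dv in (0, 1)
                   if a + da < N_STATES and v + dv < N_STATES})

print("number of composite states:", len(composite_states))  # 3 x 3 = 9
print("successors of (0, 0):", successors((0, 0)))
print("phone-exit state:", (N_STATES - 1, N_STATES - 1))  # both streams finish together
```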
Journal title:
Volume / Issue:
Pages: -
Publication date: 2005